Abstract

We investigate the estimation of a weighted density of the form $g = w(F)f$, where $f$ denotes
an unknown density, $F$ the associated distribution function and $w$ a known (non-negative) weight.
This class encompasses many examples, including those arising in order statistics or when $g$ is related
to the maximum or the minimum of $N$ (random or fixed) independent and identically distributed
(i.i.d.) random variables. We construct a new adaptive non-parametric estimator for $g$ based
on a plug-in approach and the wavelet methodology. For a wide class of models, we prove that it
attains fast rates of convergence under the $\mathbb{L}_p$ risk with $p \ge 1$ (not only for $p = 2$, which
corresponds to the mean integrated squared error) over Besov balls. The theoretical findings are
illustrated through several simulations.

Key words and phrases: Weighted density, density estimation, plug-in approach, wavelets, block 
thresholding, reliability, series system, parallel system. 

1 Introduction 

1.1 Problem statement 

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, $X$ be a real random variable with unknown density $f$ and $Y$ be a
random variable having the unknown weighted density

$$ g(x) = w(F(x)) f(x), \qquad x \in \mathbb{R}, \tag{1.1} $$

where $w$ denotes a known (non-negative) weight and $F$ denotes the distribution function associated with $f$.
The goal we pursue here is to estimate $g$ from an i.i.d. sample $X_1, \ldots, X_n$ of $X$.

Such an estimation problem arises in many situations, typically when $g$ is related to the maximum¹ of
$N$ i.i.d. random variables, where $N$ is a discrete random number in $\mathbb{N}^*$ which is independent of the $X_i$'s.
Application fields cover hydrology, meteorology, reliability, investment, management science, insurance
business, etc. For example, when the $X_i$ are non-negative, the random variable $Y = \max(X_1, \ldots, X_N)$
(or $Y = \min(X_1, \ldots, X_N)$) arises naturally in reliability theory as the lifetime of a parallel (series) system
with a random number $N$ of identical components with lifetimes $X_1, \ldots, X_N$.

To make things clearer to the reader, we next give some illustrative examples. 

1.2 Motivating examples 

Example 1.1 (Order statistics). Let $X_1, \ldots, X_m$ be i.i.d. random variables with absolutely continuous
distribution function $F$ and probability density function (pdf) $f$. Let $X_{(1)} \le \ldots \le X_{(m)}$ denote the
corresponding order statistics. Then, the pdf $g_{X_{(j)}}$ of the $j$-th order statistic is

$$ g_{X_{(j)}}(x) = \frac{m!}{(j-1)!\,(m-j)!}\,(F(x))^{j-1}\,(1 - F(x))^{m-j} f(x), \qquad x \in \mathbb{R}. $$

Thus, $X_{(m)}$, for example, is the random variable representing the largest observation of a sample of size $m$
and corresponds to the sample maximum; the density $g_{X_{(m)}}$ of $X_{(m)} = \max(X_1, \ldots, X_m)$ is given by

$$ g_{X_{(m)}}(x) = m\,(F(x))^{m-1} f(x), \qquad x \in \mathbb{R}. $$

The aim is to estimate $g_{X_{(j)}}$ from the $m$ i.i.d. observations $X_1, \ldots, X_m$ of $X$.

¹ Since $\min(X_1, \ldots, X_N) = -\max(-X_1, \ldots, -X_N)$, the results can be easily reformulated for the sample minimum.
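As a quick sanity check on Example 1.1, the order-statistic density has the form (1.1) with weight $w(u) = \frac{m!}{(j-1)!(m-j)!} u^{j-1}(1-u)^{m-j}$, so $w$ should integrate to 1 over $[0,1]$. The sketch below verifies this numerically; the function names `order_stat_weight` and `midpoint_integral` are ours and purely illustrative.

```python
from math import comb


def order_stat_weight(u, j, m):
    """Weight w(u) of the j-th order statistic among m i.i.d. draws.

    Uses m! / ((j-1)! (m-j)!) = j * C(m, j).
    """
    return j * comb(m, j) * u ** (j - 1) * (1.0 - u) ** (m - j)


def midpoint_integral(func, a=0.0, b=1.0, steps=10000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


# The weight of the 2nd order statistic out of 5 integrates to (about) 1.
total = midpoint_integral(lambda u: order_stat_weight(u, 2, 5))
print(round(total, 6))
```

For $j = m$ the function reduces to $m u^{m-1}$, the weight of the sample maximum.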

Example 1.2 (Maximum of a random number $N$ of i.i.d. random variables). Let $X$ be a random
variable with density $f$, $\{X_n\}_{n \in \mathbb{N}^*}$ be a sequence of i.i.d. random variables with density $f$ and $N$ be a
discrete random variable taking values in $\mathbb{N}^*$ with a known probability mass function. Then the density
of $Y = \max(X_1, \ldots, X_N)$ is

$$ g(x) = w(F(x)) f(x), \tag{1.2} $$

where

$$ w(u) = \sum_{k=1}^{\infty} k\,u^{k-1}\,\mathbb{P}(N = k), \qquad u \in [0,1]. $$

The goal is again to estimate $g$ from an i.i.d. sample $X_1, \ldots, X_n$ of $X$.
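For completeness, (1.2) can be checked by conditioning on $N$. The short derivation below assumes, as in Section 1.1, that $N$ is independent of the $X_i$'s, and that the series can be differentiated term by term (for instance when $\mathbb{E}(N) < \infty$); the latter condition is our assumption and is not spelled out in the example.

```latex
\begin{align*}
\mathbb{P}(Y \le x)
  &= \sum_{k=1}^{\infty} \mathbb{P}(N = k)\,\mathbb{P}\big(\max(X_1,\ldots,X_k) \le x\big)
   = \sum_{k=1}^{\infty} \mathbb{P}(N = k)\,(F(x))^{k}, \\
g(x) = \frac{d}{dx}\,\mathbb{P}(Y \le x)
  &= \Big(\sum_{k=1}^{\infty} k\,(F(x))^{k-1}\,\mathbb{P}(N = k)\Big) f(x)
   = w(F(x))\,f(x).
\end{align*}
```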

Example 1.3 (Pile-up model). Let us now present the "pile-up model". Let $\{Y_n\}_{n \in \mathbb{N}^*}$ be a sequence of
i.i.d. random variables with density $g$, $N$ be a discrete random variable in $\mathbb{N}^*$ as in the previous example,
and let $X = \min(Y_1, \ldots, Y_N)$ with density $f$. Then the density of $Y_1$ is

$$ g(x) = w(F(x)) f(x), \qquad x \in \mathbb{R}, $$

where

$$ w(u) = \frac{1}{M'(M^{-1}(1-u))}, \qquad u \in [0,1], $$

$M(u) = \mathbb{E}(u^N)$ and $M'(u) = \mathbb{E}(N u^{N-1})$. We are seeking an estimate of $g$ from an i.i.d. sample
$X_1, \ldots, X_n$ of $X$.
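The form of the pile-up weight can be recovered in two lines. In the sketch below, $G$ denotes the distribution function of $Y_1$ (a symbol we introduce for the derivation), and we assume that $N$ is independent of the $Y_i$'s and that $M$ is invertible on $[0,1]$, which holds because $N \ge 1$ makes $M$ strictly increasing.

```latex
\begin{align*}
1 - F(x) &= \mathbb{P}\big(\min(Y_1,\ldots,Y_N) > x\big)
          = \mathbb{E}\big((1 - G(x))^{N}\big) = M(1 - G(x)), \\
f(x) &= M'(1 - G(x))\,g(x)
      \quad\Longrightarrow\quad
g(x) = \frac{f(x)}{M'\big(M^{-1}(1 - F(x))\big)} = w(F(x))\,f(x).
\end{align*}
```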



1.3 Previous work 

Some distributional properties of the maximum and minimum of random variables have been extensively
studied in the literature (see, e.g., Raghunandanan and Patil (1972), Shaked (1975) and Shaked and
Wong (1997)). In addition, the literature on order statistics contains a large body of work on the maximum.
In the context of extreme value theory, various statistical properties and (real data) applications can be
found in Adamidis et al. (2005), Louzada et al. (2012) and the references therein.

The estimation of the density function of the maximum of two independent random variables has
been considered by Chen and Hsu (2004) via kernel methods. The pile-up model has been considered by
Comte and Rebafka (2010) via model selection methods.



1.4 Contributions and relation to prior work 

In this paper, we develop a new non-linear adaptive estimator for $g$ in model (1.1) based on a plug-in
method, wavelets and the block thresholding rule introduced by Cai (1999). Wavelet-based thresholding
estimators are attractive for non-parametric function estimation because of their spatial adaptivity,
computational efficiency and asymptotic optimality properties. In the case of
simple density estimation, wavelet thresholding is probably one of the most attractive nonlinear methods.
We refer to, e.g., Antoniadis (1997), Härdle et al. (1998) and Vidakovic (1999) for a detailed discussion
of the performance of wavelet estimators and some of their advantages over traditional methods such as
kernel-based or projection estimators.

We explore the theoretical performance of our estimator under the $\mathbb{L}_p$ risk with $p \ge 1$ over a very
rich class of function spaces, namely Besov spaces. Sharp rates of convergence are obtained. The application
of our theory to Example 1.2 above is described in detail. Finally, numerical experiments are carried out
to illustrate the practical performance of our estimator. In particular, the numerical tests indicate that
the block thresholding estimators compare very favorably to standard kernel methods.



1.5 Paper organization 

The paper is structured as follows. Our wavelet estimator is described in Section 2. Section 3 presents 
our theoretical results. Simulations are detailed in Section 4. The proofs are postponed to Section 5. 



2 Wavelet estimators 

First of all, we briefly recall some key facts on wavelets and Besov spaces that will be essential to us in 
the sequel. Then we develop our nonlinear adaptive wavelet block thresholding estimator.

2.1 Wavelets and Besov balls

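Let $b > 0$, $p \ge 1$ and

$$ \mathbb{L}_p([-b,b]) = \Big\{ h : [-b,b] \to \mathbb{R};\ \|h\|_p^p = \int_{-b}^{b} |h(x)|^p\,dx < +\infty \Big\}. $$

For the purposes of this paper, we use compactly supported wavelet bases on $[-b,b]$. More precisely,
we consider the Daubechies family db2R with the scaling and wavelet functions $\phi$ and $\psi$, where $R > 2$ is
a fixed integer; see, e.g., Mallat (2009). Define the scaled and translated versions of $\phi$ and $\psi$:

$$ \phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k). $$

Then there exist an integer $\tau$ and a set of consecutive integers $\Lambda_j$ such that $\mathrm{Card}(\Lambda_j) = C 2^j$ for a constant $C > 0$
and, for any integer $\ell \ge \tau$, the collection

$$ \mathcal{B} = \{ \phi_{\ell,k},\ k \in \Lambda_\ell;\ \psi_{j,k},\ j \in \mathbb{N} - \{0, \ldots, \ell - 1\},\ k \in \Lambda_j \} $$

is an orthonormal basis of $\mathbb{L}_2([-b,b])$.

Consequently, for any integer $\ell \ge \tau$, any $h \in \mathbb{L}_2([-b,b])$ can be expanded on $\mathcal{B}$ as

$$ h(x) = \sum_{k \in \Lambda_\ell} \alpha_{\ell,k}\,\phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty} \sum_{k \in \Lambda_j} \beta_{j,k}\,\psi_{j,k}(x), $$

where

$$ \alpha_{\ell,k} = \int_{-b}^{b} h(x)\,\phi_{\ell,k}(x)\,dx, \qquad \beta_{j,k} = \int_{-b}^{b} h(x)\,\psi_{j,k}(x)\,dx. \tag{2.1} $$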



As is traditional in the wavelet estimation literature, we will investigate the performance of our
estimator by assuming that the unknown density $f$ belongs to a Besov ball. The Besov norm for a
function can be related to a sequence space norm on its wavelet coefficients. More precisely, let $M > 0$,
$s \in (0, R)$, $q \ge 1$ and $r \ge 1$. A function $h \in \mathbb{L}_p([-b,b])$ belongs to $B^s_{q,r}(M)$ if and only if there exists a
constant $M^* > 0$ (depending on $M$) such that the associated wavelet coefficients (2.1) satisfy

$$ \Big( \sum_{k \in \Lambda_\tau} |\alpha_{\tau,k}|^q \Big)^{1/q} + \Bigg( \sum_{j=\tau}^{\infty} \Big( 2^{j(s + 1/2 - 1/q)} \Big( \sum_{k \in \Lambda_j} |\beta_{j,k}|^q \Big)^{1/q} \Big)^{r} \Bigg)^{1/r} \le M^*, $$

with the usual modifications if $q = \infty$ or $r = \infty$.

In this expression, $s$ is a smoothness parameter and $q$ and $r$ are norm parameters. Besov spaces include many
traditional smoothness spaces such as Hölder and Sobolev spaces. A comprehensive account of Besov
spaces can be found in, e.g., DeVore and Popov (1988); Meyer (1992); Härdle et al. (1998).



2.2 Plug-in block wavelet estimator 

Let us consider the general statistical framework described in Section 1 with i.i.d. sample $X_1, \ldots, X_n$.
First of all, we investigate the estimation of $f$ via the so-called wavelet block hard thresholding estimator.
We suppose that $\mathrm{supp}(f) \subseteq [-b, b]$ with $b > 0$.

Let $p \ge 1$, and $j_1$ and $j_2$ be the integers corresponding to the coarsest and finest scales, defined as

$$ j_1 = \lfloor (\max(p, 2)/2) \log_2(\log n) \rfloor, \qquad j_2 = \lfloor \log_2(n / \log n) \rfloor, $$

where $\lfloor a \rfloor$ denotes the whole number part of $a \in \mathbb{R}_+$. For any $j \in \{j_1, \ldots, j_2\}$, let $\mathcal{A}_j$ and $U_{j,K}$ be given
such that $\bigcup_{K \in \mathcal{A}_j} U_{j,K} = \Lambda_j$, $|U_{j,K}| = L = \lfloor (\log n)^{\max(p,2)/2} \rfloor$ and $U_{j,K} \cap U_{j,K'} = \emptyset$ for any $K \ne K'$ with
$K, K' \in \mathcal{A}_j$. In a nutshell, at each scale $j$, each $U_{j,K}$ is the set containing the position indices of the wavelet
coefficients inside block $K \in \mathcal{A}_j$.
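To make the scale range and block length concrete, the formulas for $j_1$, $j_2$ and $L$ can be evaluated directly for a given sample size $n$ and risk exponent $p$. The helper name `block_parameters` below is ours; this is only an arithmetic illustration of the definitions, not part of the authors' implementation.

```python
from math import floor, log, log2


def block_parameters(n, p):
    """Return (j1, j2, L) from the definitions of Section 2.2 (requires n >= 3)."""
    c = max(p, 2) / 2
    j1 = floor(c * log2(log(n)))      # coarsest scale
    j2 = floor(log2(n / log(n)))      # finest scale
    block_length = floor(log(n) ** c)  # L, the size of each block U_{j,K}
    return j1, j2, block_length


print(block_parameters(1000, 2))
print(block_parameters(1000, 4))
```

For $n = 1000$ and $p = 2$ this gives $j_1 = 2$, $j_2 = 7$ and $L = 6$; increasing $p$ to 4 raises the coarsest scale to $j_1 = 5$ and the block length to $L = 47$.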



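We define the wavelet block hard thresholding estimator of $f$ by

$$ \hat f(x) = \sum_{k \in \Lambda_{j_1}} \hat\alpha_{j_1,k}\,\phi_{j_1,k}(x) + \sum_{j=j_1}^{j_2} \sum_{K \in \mathcal{A}_j} \sum_{k \in U_{j,K}} \hat\beta_{j,k}\,\mathbf{1}_{\{\hat b_{j,K} \ge \kappa n^{-1/2}\}}\,\psi_{j,k}(x), \qquad x \in [-b, b], \tag{2.2} $$

where $\mathbf{1}$ denotes the indicator function,

$$ \hat\alpha_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \phi_{j,k}(X_i), \qquad \hat\beta_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \psi_{j,k}(X_i), \qquad \hat b_{j,K} = \Big( \frac{1}{L} \sum_{k \in U_{j,K}} |\hat\beta_{j,k}|^p \Big)^{1/p}, $$

and $\kappa$ denotes a large enough constant.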

The estimator $\hat f$ was initially developed by Cai (1999) for the regression model under the $\mathbb{L}_2$ risk with
equispaced deterministic samples. The $\mathbb{L}_p$ risk version was studied in Picard and Tribouley (2000) for the
standard density estimation problem and by Chesneau (2010) for the biased density estimation problem.

The idea underlying $\hat f$ (2.2) is to operate a group/block selection: it keeps intact the large groups
of unknown wavelet coefficients of $f$ (2.1) and removes the others. Wavelet block thresholding is one
of the most attractive non-linear thresholding methods, since it is both numerically straightforward to
implement and asymptotically optimal for a large variety of Sobolev or Besov classes. Detailed references
on the subject for various models include, but are not limited to, Cai (1999, 2002), Li and Xiao (2008),
Li (2008), Picard and Tribouley (2000) and Chesneau (2008, 2010).
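The block selection idea can be sketched in a few lines. The code below is a simplified illustration and not the authors' implementation: it assumes data in $[0, 1)$, swaps in the Haar wavelet for the Daubechies family, uses the normalized $\ell_p$ norm of each block as the thresholding statistic, lets the last block of a level be shorter than $L$, and takes an arbitrary $\kappa = 1$. All function names are ours.

```python
import math
import random


def haar_coefficients(sample, j):
    """Empirical Haar coefficients at level j for data in [0, 1):
    alpha[k] = mean of phi_{j,k}(X_i), beta[k] = mean of psi_{j,k}(X_i)."""
    n, size, scale = len(sample), 2 ** j, 2.0 ** (j / 2)
    alpha, beta = [0.0] * size, [0.0] * size
    for x in sample:
        t = x * size
        k = min(int(t), size - 1)  # only index k whose Haar functions see x
        alpha[k] += scale / n
        beta[k] += (scale if t - k < 0.5 else -scale) / n
    return alpha, beta


def block_threshold_density(sample, p=2.0, kappa=1.0):
    """Haar block hard thresholding density estimate on [0, 1) (sketch)."""
    n = len(sample)
    c = max(p, 2.0) / 2.0
    j1 = int(c * math.log2(math.log(n)))
    j2 = int(math.log2(n / math.log(n)))
    block_len = max(1, int(math.log(n) ** c))
    alpha, _ = haar_coefficients(sample, j1)
    kept = {}  # level j -> detail coefficients after block thresholding
    for j in range(j1, j2 + 1):
        _, beta = haar_coefficients(sample, j)
        for start in range(0, len(beta), block_len):
            block = beta[start:start + block_len]
            stat = (sum(abs(b) ** p for b in block) / len(block)) ** (1.0 / p)
            if stat < kappa / math.sqrt(n):  # small block: kill it entirely
                beta[start:start + block_len] = [0.0] * len(block)
        kept[j] = beta

    def f_hat(x):
        k = min(int(x * 2 ** j1), 2 ** j1 - 1)
        value = alpha[k] * 2.0 ** (j1 / 2)
        for j, beta in kept.items():
            t = x * 2 ** j
            k = min(int(t), 2 ** j - 1)
            sign = 1.0 if t - k < 0.5 else -1.0
            value += beta[k] * sign * 2.0 ** (j / 2)
        return value

    return f_hat


random.seed(0)
data = [random.betavariate(2, 5) for _ in range(1000)]
estimate = block_threshold_density(data)
print(estimate(0.2), estimate(0.8))
```

Because the Haar wavelet integrates to zero, the estimate integrates to one whatever blocks are kept; with a very large `kappa` it degenerates to a histogram at resolution $2^{-j_1}$.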

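Finally, plugging (2.2) into (1.1) leads to the following estimator of $g$:

$$ \hat g(x) = w(\hat F(x))\,\hat f(x), \qquad x \in [-b, b], \tag{2.3} $$

where

$$ \hat F(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \le x\}}. \tag{2.4} $$

The rest of the paper explores the theoretical and practical performance of $\hat g$.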
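The plug-in step itself is a one-liner once a density estimate is available. The sketch below builds the empirical distribution function and composes it with a weight and an arbitrary density estimate; the names `empirical_cdf` and `plug_in_weighted_density` are ours, and the constant density used in the example is a stand-in, not the wavelet estimator.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return x -> (1/n) * #{i : X_i <= x}."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def plug_in_weighted_density(w, f_hat, cdf_hat):
    """Return x -> w(cdf_hat(x)) * f_hat(x), the plug-in estimate of g."""
    return lambda x: w(cdf_hat(x)) * f_hat(x)


# Maximum of m = 2 components, w(u) = 2u, with a stand-in uniform density.
cdf_hat = empirical_cdf([0.1, 0.4, 0.4, 0.9])
g_hat = plug_in_weighted_density(lambda u: 2.0 * u, lambda x: 1.0, cdf_hat)
print(cdf_hat(0.4), g_hat(0.4))
```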



3 Main results 



In this section, we discuss the asymptotic properties of the proposed estimator. Rates of $\mathbb{L}_p$ convergence
are investigated under the following assumptions:

(A.1) Compact support: $\mathrm{supp}(f) \subseteq [-b, b]$ with $b > 0$.

(A.2) Uniform boundedness: there exists a constant $C_1 > 0$ such that

$$ \sup_{x \in [-b, b]} f(x) \le C_1. \tag{3.1} $$

(A.3) Uniform Lipschitz continuity of $w$ (with Lipschitz constant $C_2$):

$$ |w(u) - w(v)| \le C_2 |u - v|, \qquad \text{for all } (u, v) \in [0, 1]^2. \tag{3.2} $$
3.1 Estimator convergence rates 

Theorem 3.1 studies the $\mathbb{L}_p$ risk of $\hat g$ (2.3) over Besov balls under Assumptions A.1-A.3 on $f$ and $w$.

Theorem 3.1. Consider the general statistical framework described in Section 1 (the estimation of $g$
(1.1) is of interest). Suppose that Assumptions A.1-A.3 hold. Let $p \ge 1$ and $\hat g$ be given by (2.3). Then,
for any $f \in B^s_{q,r}(M)$, $q \ge 1$, $r \ge 1$ and $s \in (1/p, R)$, there exists a constant $C > 0$ such that

$$ \mathbb{E}\big( \|\hat g - g\|_p^p \big) \le C \varphi_{n,p}, $$

where

$$ \varphi_{n,p} = \begin{cases}
n^{-sp/(2s+1)} & \text{if } q \ge p, \\[4pt]
\left( \dfrac{\log n}{n} \right)^{sp/(2s+1)} & \text{if } \{p > q,\ qs > (p-q)/2\}, \\[8pt]
\left( \dfrac{\log n}{n} \right)^{(s - 1/q + 1/p)p/(2(s - 1/q) + 1)} & \text{if } \{qs < (p-q)/2\} \text{ or } \{qs = (p-q)/2,\ p \le q/r\}, \\[8pt]
\left( \dfrac{\log n}{n} \right)^{(s - 1/q + 1/p)p/(2(s - 1/q) + 1)} (\log n)^{(p - q/r)} & \text{if } \{qs = (p-q)/2,\ p > q/r\}.
\end{cases} \tag{3.3} $$

Note that the rate $\varphi_{n,p}$ in (3.3) is the near optimal (or optimal in some regimes) one in the minimax
sense for $f$. See, e.g., Donoho et al. (1996) and Härdle et al. (1998). In plain words, the near optimality
of the estimator of $f$ is transferred to that of $g$ through the plug-in principle.
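To get a feel for the four regimes of (3.3), the rate can be evaluated numerically. The function below is a direct transcription of the case split as we read it; the name `rate` is ours, and the equality test on $qs = (p - q)/2$ is exact only for inputs that are representable without rounding.

```python
from math import log


def rate(n, s, p, q, r):
    """Evaluate the rate of (3.3) for sample size n and Besov parameters (s, q, r)."""
    a1 = s * p / (2 * s + 1)
    a2 = (s - 1 / q + 1 / p) * p / (2 * (s - 1 / q) + 1)
    eps = q * s - (p - q) / 2
    if q >= p:
        return n ** (-a1)
    if eps > 0:
        return (log(n) / n) ** a1
    base = (log(n) / n) ** a2
    if eps == 0 and p > q / r:
        return base * log(n) ** (p - q / r)
    return base


# Dense regime (q >= p): the classical n^{-2s/(2s+1)} rate for p = 2.
print(rate(1000, s=1, p=2, q=2, r=1))
# Regime p > q with qs > (p - q)/2: an extra logarithmic factor appears.
print(rate(1000, s=1, p=2, q=1, r=1))
```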

The proof of Theorem 3.1 uses a suitable decomposition of the $\mathbb{L}_p$ risk and capitalizes on results on
the performance of $\hat f$ (2.2) and $\hat F$ (2.4) established in Chesneau (2010). See Section 5 for details.

3.2 An illustrative application 

Let us recall Example 1.2, where $\{X_n\}_{n \in \mathbb{N}^*}$ is a sequence of i.i.d. random variables with pdf $f$ and
$N$ is a discrete random variable with values in $\mathbb{N}^*$, independent of this sequence. The density of $Y =
\max(X_1, \ldots, X_N)$ is given by (1.2).

Suppose that Assumptions A.1-A.2 hold. Several examples for the distribution of $N$ can then be
considered.

(a) Degenerate distribution. $\mathbb{P}(N = m) = 1$. Then

$$ w(u) = m u^{m-1}, \qquad u \in [0, 1]. \tag{3.4} $$

(b) Geometric distribution. $N \sim \mathcal{G}(\eta)$ ($\mathbb{P}(N = k) = \eta (1 - \eta)^{k-1}$, $k \in \mathbb{N}^*$). Then

$$ w(u) = \frac{\eta}{(1 - (1 - \eta) u)^2}, \qquad u \in [0, 1]. \tag{3.5} $$

(c) Poisson plus 1 distribution. $N = P + 1$ with $P \sim \mathcal{P}(\lambda)$ ($\mathbb{P}(N = k) = e^{-\lambda} \frac{\lambda^{k-1}}{(k-1)!}$, $k \in \mathbb{N}^*$). Then

$$ w(u) = e^{-\lambda} e^{\lambda u} (1 + \lambda u), \qquad u \in [0, 1]. $$

Remark 3.2. In examples (a)-(c) above it is clear that Assumption A.3 is satisfied; more precisely, in
example (a), we have $C_2 = m(m - 1)$; in example (b), we have $C_2 = 2(1 - \eta)/\eta^2$; in example (c), we have
$C_2 = \lambda(2 + \lambda)$.
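The closed forms in (a)-(c) and the Lipschitz constants of Remark 3.2 are easy to cross-check numerically against the series defining $w$ in Example 1.2. The sketch below does so with a truncated series; all function names are ours, and the truncation at 60 terms is an assumption that is adequate for the parameter values shown but not for $\eta$ close to 0 with $u$ close to 1.

```python
from math import exp, factorial


def w_degenerate(u, m):
    return m * u ** (m - 1)


def w_geometric(u, eta):
    return eta / (1.0 - (1.0 - eta) * u) ** 2


def w_poisson_plus_one(u, lam):
    return exp(-lam) * exp(lam * u) * (1.0 + lam * u)


def w_series(u, pmf, terms=60):
    """Truncated version of w(u) = sum_k k u^(k-1) P(N = k)."""
    return sum(k * u ** (k - 1) * pmf(k) for k in range(1, terms + 1))


def max_slope(w, steps=1000):
    """Largest finite-difference slope of w on a uniform grid of [0, 1]."""
    return max(abs(w((i + 1) / steps) - w(i / steps)) * steps for i in range(steps))


eta, lam = 0.5, 2.0
geometric_pmf = lambda k: eta * (1.0 - eta) ** (k - 1)
poisson_pmf = lambda k: exp(-lam) * lam ** (k - 1) / factorial(k - 1)

print(w_geometric(0.7, eta), w_series(0.7, geometric_pmf))
print(w_poisson_plus_one(0.3, lam), w_series(0.3, poisson_pmf))
print(max_slope(lambda u: w_geometric(u, eta)), 2 * (1 - eta) / eta ** 2)
```

The finite-difference slopes never exceed the constants of Remark 3.2, and they approach them near $u = 1$, where each of these weights is steepest.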

In this context, Theorem 3.1 can be applied. Let $p \ge 1$ and $\hat g$ be the estimator given in (2.3). Then,
for any $f \in B^s_{q,r}(M)$, $q \ge 1$, $r \ge 1$ and $s \in (1/p, R)$, there exists a constant $C > 0$ such that

$$ \mathbb{E}\big( \|\hat g - g\|_p^p \big) \le C \varphi_{n,p}, $$

where $\varphi_{n,p}$ is given by (3.3).

Remark 3.3. Taking $m = 2$, the obtained rate is similar to the one attained by the kernel estimators
developed by Chen and Hsu (2004); the only difference is the extra logarithmic term $(\log n)^{2s/(2s+1)}$.
However, unlike the kernel estimators of Chen and Hsu (2004), our procedure $\hat g$ is adaptive and our rate of
convergence holds for a wider class of functions $f$, including Hölder classes, Sobolev classes, etc.

4 Simulation results 

We now illustrate these theoretical results by a simulation study within the context described in Section
3.2. That is, we consider the problem of estimating the density $g$ of the maximum of a random number
$N$ of i.i.d. random variables. From a reliability study standpoint, this problem corresponds to a parallel
system with $N$ identical components. We consider two numerical examples, which
complement the asymptotic results of Theorem 3.1.

Computational aspects. In the sequel, we will refer to our adaptive wavelet estimator (2.3) simply as
Block. We have compared its performance to alternatives from the literature on several densities. We have
considered the uniform distribution, as well as a family of normal mixture densities ("SeparatedBimodal",
"Kurtotic" and "StronglySkewed", initially introduced in Marron and Wand (1992)) representing different
degrees of smoothness (see Figure 1). We assumed that the density function $f$ of the $X_i$'s has a compact
support included in $[-b, b]$. We have used the formulae given by Marron and Wand (1992) to simulate
such densities so that

$$ \min_l (\mu_l - 3\sigma_l) = -3, \qquad \max_l (\mu_l + 3\sigma_l) = 3, $$

where $l = 1, \ldots, d$ with $d$ the number of densities in the mixture (see Marron and Wand (1992, Section
4, Table 1) for the values of the parameters). Thereby, it is very unlikely to have values outside the
interval $[-4, 4]$ and we lose little by assuming compact support. The same kind of assumption was made
in the context of wavelet density estimation by Vannucci and Vidakovic (1997). In order to simplify the
presentation of the results, one can simply rescale the data such that they fall into $[-b, b]$ (which covers
the full range of all observed data). The density was then evaluated at $T = 2^J$ equispaced points
$t_i = 2ib/T$, $i = -T/2, \ldots, T/2 - 1$, between $-b$ and $b$, where $J$ is the index of the highest resolution level
and $T$ is the number of discretization points. The primary level $j_1 = 3$, $T = 512$ and the Symmlet wavelet
with 6 vanishing moments were used throughout all experiments. All simulations have been implemented
under Matlab.

Figure 1: Test densities. Panels, from left to right: Uniform, SeparatedBimodal, Kurtotic, StronglySkewed.

Figure 2: Typical reconstructions from a single simulation with n = 1000 for the Kurtotic density. The
dashed line depicts the original density and the solid one depicts its wavelet block estimate. (a): f(x).
(b): F(x). (c): w(F(x)). (d): g(x) = w(F(x))f(x).

Figure 3: The influence of p on the numerical values of the $\mathbb{L}_p$ risk (in log-log scale) of Block (solid) and
term-by-term (dashed) thresholding (L = 1). Panels: (a) Uniform, (b) SeparatedBimodal, (c) Kurtotic,
(d) StronglySkewed.

Results and discussion. In order to illustrate Theorem 3.1, we study the influence of $p$ on the numerical
performance of the Block and the term-by-term ($L = 1$) thresholding estimators. Let us first consider
a parallel system with $m = 2$ identical independent components. Then, the corresponding weight
function is (3.4) and the goal is to estimate $g$ in (1.2) from a sample $X_1, \ldots, X_n$ simulated from one of the
test densities. A typical example of estimation for the Kurtotic density (for $p = 2$), with $n = 1000$, is
given in Figure 2. The mean $\mathbb{L}_p$ risk of $\hat g$, i.e., $R_p(\hat g, g) = (1/T) \sum_{i=-T/2}^{T/2-1} |\hat g(t_i) - g(t_i)|^p$, is obtained with
10 samples for $n = 1000$, and it is plotted as a function of $p$ in Figure 3. As predicted by Theorem 3.1,
the larger $p$, the smaller the $\mathbb{L}_p$ risk of $\hat g$. We can see that our estimation procedure provides better results
than the term-by-term thresholding ($L = 1$) in all cases. In particular, the risk improvement achieved by
the block estimator over the term-by-term estimator is significant for the non-smooth Uniform density.
This is in agreement with the predictions of our theoretical findings.

Figure 4: (a): Density estimates. (b): Graph of the LSCV function versus the kernel bandwidth h for
each of the tested densities; the vertical dashed lines represent the value of h that minimizes LSCV(h). (c):
The solid line depicts the estimated MISE as a function of h, the vertical dashed-dotted lines represent
the true MISE-minimizing bandwidth $h_{\mathrm{MISE}}$ and the vertical dotted lines represent the pilot bandwidth
$h_{\mathrm{ROT}}$.
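The discretized risk $R_p$ used in these experiments is simply the mean of the $p$-th powers of the pointwise errors over the evaluation grid. A minimal sketch, with a function name (`lp_risk`) of our choosing:

```python
def lp_risk(estimate_values, true_values, p):
    """Discretized L_p risk: (1/T) * sum_i |g_hat(t_i) - g(t_i)|^p."""
    if len(estimate_values) != len(true_values):
        raise ValueError("both inputs must be evaluated on the same grid")
    total = sum(abs(e - t) ** p for e, t in zip(estimate_values, true_values))
    return total / len(true_values)


# Two grid points, errors 0 and 2: the risk is (0 + 2^p) / 2.
print(lp_risk([1.0, 2.0], [1.0, 4.0], p=2))
print(lp_risk([1.0, 2.0], [1.0, 4.0], p=1))
```

When the pointwise errors are below 1, this quantity decreases as $p$ grows, which is one reason to read the curves of Figure 3 jointly with the rates of Theorem 3.1 rather than in isolation.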

In our second example, the adaptive estimator described in Section 3.2 is tested when $N$ follows a
Geometric distribution, so that the weight function is that given by (3.5). This example is devoted to a
simulation study comparing the performance of the block hard thresholding estimator with that of the
traditional kernel estimator defined as follows:

$$ \hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big( \frac{x - X_i}{h} \Big), \tag{4.1} $$

where the positive kernel $K$ satisfies $\int K(x)\,dx = 1$ and the smoothing parameter $h$ is known as the
bandwidth.

Many procedures of bandwidth selection for kernel density estimation have been developed in the
literature (see, e.g., Silverman (1986)). We use least-squares cross-validation (LSCV) (Rudemo (1982),
Bowman (1984)), where the bandwidth is defined as

$$ h_{\mathrm{LSCV}} = \arg\min_{h} \Big\{ \int_{-b}^{b} \hat f_h(x)^2\,dx - 2 n^{-1} \sum_{i=1}^{n} \hat f_{-i}(X_i) \Big\}, $$

and $\hat f_{-i}$ is the leave-one-out kernel estimator constructed from the data without the observation $X_i$. It
is motivated by the fact that, for independent data,

$$ \int_{-b}^{b} \hat f_h(x)^2\,dx - 2 n^{-1} \sum_{i=1}^{n} \hat f_{-i}(X_i) $$

is an unbiased estimator of $\mathrm{MISE}(h) - \int_{-b}^{b} f^2(x)\,dx$. One frequently used cross-validation (CV) procedure
is the K-fold CV (as described, e.g., in Hastie et al. (2009, Section 7.10)) in which the data set $X_1, \ldots, X_n$
is randomly partitioned into $K$ approximately equal-sized and non-overlapping subsets $S_1, \ldots, S_K$. To
obtain the bandwidths $h_{\mathrm{LSCV}}$, we have performed a 10-fold CV, using a Gaussian kernel, with a simple
"rule-of-thumb" pilot bandwidth $h_{\mathrm{ROT}}$. Figure 4(b) contains a plot of the LSCV function versus the
kernel bandwidth $h$ and Figure 4(c) the estimated MISE as a function of $h$. For each density, it is clear
from Figure 4(b) that the value of $h_{\mathrm{LSCV}}$ is the unambiguous minimizer of LSCV($h$). We see
that $h_{\mathrm{LSCV}}$ provides a decent approximation, close to $h_{\mathrm{MISE}}$ for all test densities. For the StronglySkewed
density, the bandwidth which minimizes MISE($h$) is $h_{\mathrm{MISE}} = 0.039$, while $h_{\mathrm{LSCV}} = 0.044$. For
the Uniform density, $h_{\mathrm{MISE}} = h_{\mathrm{LSCV}} = 0.051$.
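For a Gaussian kernel the LSCV criterion has a closed form, which makes a grid search straightforward. The sketch below is an illustration under our own assumptions rather than the authors' Matlab code: it uses plain leave-one-out (not the 10-fold variant), integrates $\hat f_h^2$ over the whole real line instead of $[-b, b]$, and searches an arbitrary bandwidth grid. All function names are ours.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / SQRT_2PI


def kde(x, sample, h):
    """Kernel density estimate (4.1) with a Gaussian kernel."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def lscv(sample, h):
    """LSCV(h) = int f_hat_h^2 - (2/n) sum_i f_hat_{-i}(X_i), Gaussian kernel."""
    n = len(sample)
    square_term = 0.0  # closed form: K convolved with K is the N(0, 2) density
    loo_term = 0.0
    for i, xi in enumerate(sample):
        for j, xj in enumerate(sample):
            u = (xi - xj) / h
            square_term += math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))
            if i != j:
                loo_term += gaussian_kernel(u)
    square_term /= n * n * h
    loo_term /= n * (n - 1) * h  # equals (1/n) * sum_i f_hat_{-i}(X_i)
    return square_term - 2.0 * loo_term


def lscv_bandwidth(sample, grid):
    """Bandwidth in `grid` minimizing the LSCV criterion."""
    return min(grid, key=lambda h: lscv(sample, h))


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(1, 21)]
print(lscv_bandwidth(data, grid))
```

The double loop costs $O(n^2)$ per bandwidth, which is acceptable for a sketch but would call for binning or FFT-based evaluation at the sample sizes used in the experiments.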

We then compared the performance of the Block estimator $\hat g$ with that of the plug-in kernel estimator,
say $\hat g_{\mathrm{LSCV}}$, given by $\hat g_{\mathrm{LSCV}}(x) = w(\hat F(x)) \hat f_{\mathrm{LSCV}}(x)$, where $\hat F$ is defined by (2.4) and $\hat f_{\mathrm{LSCV}}$ is given by (4.1)
with $h = h_{\mathrm{LSCV}}$. Figure 5 shows the results of $\hat g$ and $\hat g_{\mathrm{LSCV}}$ for $N \sim \mathcal{G}(\eta)$, with $\eta = 0.9$, $\eta = 0.5$ and $\eta = 0.1$
respectively. Table 1 presents the MISE for sample sizes $n = 1000$, 2000 and 5000. In virtually all
cases, the Block estimator showed lower $\mathbb{L}_2$ risk than $\hat g_{\mathrm{LSCV}}$, with the exception of the (very
smooth) SeparatedBimodal density for which the kernel estimator performs slightly better. This comes
as no surprise given that this density is very smooth. Additionally, small discrepancies in the estimate of
$f$ may lead to substantial discrepancies in the estimate of $g$ at the locations overweighted by $w(F(\cdot))$.
This is the case for the Geometric distribution, where the weights grow as $O(1/\eta)$ at
high values of $x$, and thus the discrepancies in $\hat g$ increase as $\eta$ gets smaller. However, the kernel estimator
$\hat g_{\mathrm{LSCV}}$ seems to be more affected (see Figure 5(c)), confirming that Block generally provides a better
estimate of $f$. Furthermore, as expected, for both methods and in all cases, the MISE decreases as
the sample size increases. Without any prior smoothness knowledge on the unknown density, the Block
estimator provides very competitive results in comparison to $\hat g_{\mathrm{LSCV}}$.



Table 1: 1000 × MISE values from 50 replications of sample sizes n = 1000, 2000 and 5000, when N follows
a Geometric distribution of parameter η.

| Density | Method | η = 0.9, n = 1000 | η = 0.9, n = 2000 | η = 0.9, n = 5000 | η = 0.5, n = 1000 | η = 0.5, n = 2000 | η = 0.5, n = 5000 | η = 0.1, n = 1000 | η = 0.1, n = 2000 | η = 0.1, n = 5000 |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniform | Block | 10.49 | 7.03 | 4.18 | 11.15 | 7.61 | 4.85 | 19.15 | 15.05 | 13.39 |
| Uniform | Kernel | 13.76 | 11.37 | 9.32 | 17.62 | 14.99 | 12.60 | 70.50 | 63.24 | 55.94 |
| SeparatedBimodal | Block | 8.53 | 6.29 | 3.69 | 9.09 | 6.89 | 3.90 | 15.27 | 11.08 | 7.12 |
| SeparatedBimodal | Kernel | 6.30 | 4.98 | 3.54 | 7.00 | 5.32 | 3.82 | 13.26 | 10.86 | 7.79 |
| Kurtotic | Block | 11.31 | 8.03 | 5.52 | 11.93 | 8.04 | 5.57 | 18.08 | 9.22 | 7.01 |
| Kurtotic | Kernel | 12.27 | 8.00 | 5.83 | 13.03 | 8.44 | 6.13 | 21.97 | 13.04 | 9.66 |
| StronglySkewed | Block | 9.91 | 7.78 | 5.12 | 8.69 | 7.02 | 4.70 | 10.12 | 9.03 | 5.93 |
| StronglySkewed | Kernel | 10.57 | 8.13 | 5.97 | 11.16 | 8.68 | 6.24 | 19.62 | 15.54 | 10.86 |

Figure 5: Original densities (dashed), Block thresholding estimator $\hat g$ (solid) (1st row), kernel estimator
$\hat g_{\mathrm{LSCV}}$ (solid) (2nd row) from 50 replications of n = 1000 samples $X_1, \ldots, X_n$. From left to right: Uniform,
SeparatedBimodal, Kurtotic and StronglySkewed. $N \sim \mathcal{G}(\eta)$, with (a) η = 0.9, (b) η = 0.5 and (c)
η = 0.1.





F. Navarro, C. Chesneau and M. J. Fadili 



5 Proofs 

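Proof of Theorem 3.1. Observe that

$$ \hat g(x) - g(x) = w(\hat F(x)) (\hat f(x) - f(x)) + f(x) (w(\hat F(x)) - w(F(x))). $$

By Assumption A.3, which implies $\sup_{u \in [0,1]} w(u) \le C$, together with Assumption A.2, we have

$$ |\hat g(x) - g(x)| \le C \big( |\hat f(x) - f(x)| + |w(\hat F(x)) - w(F(x))| \big) \le C \big( |\hat f(x) - f(x)| + |\hat F(x) - F(x)| \big). $$

By the Jensen inequality we have

$$ \mathbb{E}\big( \|\hat g - g\|_p^p \big) \le C \big( \mathbb{E}( \|\hat f - f\|_p^p ) + \mathbb{E}( \|\hat F - F\|_p^p ) \big). $$

It follows from Chesneau (2010, Theorem 4.1 with $w(x) = 1 = \mu$) that

$$ \mathbb{E}\big( \|\hat f - f\|_p^p \big) \le C \varphi_{n,p}, $$

where $\varphi_{n,p}$ is given by (3.3).
Now note that

$$ \hat F(x) - F(x) = \frac{1}{n} \sum_{i=1}^{n} U_i(x), $$

with $U_i(x) = \mathbf{1}_{\{X_i \le x\}} - F(x)$. Since $U_1(x), \ldots, U_n(x)$ are i.i.d. with $\mathbb{E}(U_1(x)) = 0$, $|U_1(x)| \le 2$ and
$\mathbb{E}((U_1(x))^2) \le 4$, the Rosenthal inequality (see Rosenthal (1970)) yields

$$ \mathbb{E}\big( \|\hat F - F\|_p^p \big) \le C \sup_{x \in [-b,b]} \mathbb{E}\big( |\hat F(x) - F(x)|^p \big) \le C \frac{1}{n^{p/2}} \le C \varphi_{n,p}. $$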

Combining the inequalities above, we obtain the desired result, i.e.,

$$ \mathbb{E}\big( \|\hat g - g\|_p^p \big) \le C \varphi_{n,p}. $$

□
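As an aside, for $p = 2$ the bound on the empirical distribution function used in the proof can be made fully explicit without the Rosenthal inequality. The computation below only uses the fact that $n \hat F(x)$ follows a binomial distribution with parameters $n$ and $F(x)$; it is our illustration of the $n^{-p/2}$ order, not part of the original argument.

```latex
\begin{align*}
\mathbb{E}\big( (\hat F(x) - F(x))^2 \big)
  &= \operatorname{Var}\big( \hat F(x) \big)
   = \frac{F(x)\,(1 - F(x))}{n} \le \frac{1}{4n}, \\
\mathbb{E}\big( \|\hat F - F\|_2^2 \big)
  &= \int_{-b}^{b} \mathbb{E}\big( (\hat F(x) - F(x))^2 \big)\,dx
   \le \frac{2b}{4n} = \frac{b}{2n}.
\end{align*}
```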